In this blog we will look at the "Databricks Certified Data Engineer Associate" exam details, tips, and the key points to prepare, with crisp explanations. Please go through every topic mentioned in this blog, as well as the links provided under some of the topics. This will surely help you.
Certification Details in short:
The certification exam has 45 multiple-choice questions. The distribution below gives an overview of the exam:
Databricks Lakehouse Platform – 24% (11/45)
ELT with Spark SQL and Python – 29% (13/45)
Incremental Data Processing – 22% (10/45)
Production Pipelines – 16% (7/45)
Data Governance – 9% (4/45)
Focus on the points below; they are important topics that are likely to come up in the exam. I have given crisp details for some of them.
Please focus on these topics for the exam:
1. What is DESCRIBE DETAIL:
It gives details of the table such as file location, number of files, size in bytes, etc.
DESCRIBE DETAIL table_name
2. Rollback:
Even after deleting data from a table, we can restore an earlier version.
RESTORE TABLE table_name TO VERSION AS OF 8
The example above brings the table back to version 8.
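A handy companion here (not shown above) is DESCRIBE HISTORY, which lists the available versions you can restore to; table_name and the timestamp are placeholders:

```sql
-- List all versions (commits) of the table, with timestamps and operations
DESCRIBE HISTORY table_name;

-- RESTORE also works with a timestamp instead of a version number
RESTORE TABLE table_name TO TIMESTAMP AS OF '2024-01-01';
```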
3. Vacuum:
Used to delete old data files, as it is difficult to maintain all of them in production. By default, only files older than the 7-day retention period (168 hours) can be vacuumed; a shorter duration will not work unless we disable the retention check as below:
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
SET spark.databricks.delta.vacuum.logging.enabled = true;
VACUUM table_name RETAIN 0 HOURS DRY RUN
RETAIN 0 HOURS keeps only the current version; DRY RUN just lists the files that would be deleted without removing them.
4. View:
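The post leaves this section blank, so as a quick refresher: the exam typically distinguishes three view types, which differ in how long they live and who can see them (table and view names below are made up):

```sql
-- Persisted in the metastore, available across sessions
CREATE VIEW sales_view AS SELECT * FROM sales WHERE region = 'US';

-- Dropped automatically when the current Spark session ends
CREATE TEMP VIEW sales_temp AS SELECT * FROM sales;

-- Shared across sessions on the same cluster; queried via the global_temp schema
CREATE GLOBAL TEMP VIEW sales_global AS SELECT * FROM sales;
SELECT * FROM global_temp.sales_global;
```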
5. Cloning:
CREATE OR REPLACE TABLE table_name DEEP CLONE table_name2
A deep clone fully copies both data and metadata.
* Shallow Clone:
A shallow clone copies only the Delta transaction logs; the data files are not moved.
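For completeness, the shallow clone syntax mirrors the deep clone syntax above (table names are placeholders):

```sql
-- Copies only the Delta transaction log; data files stay with the source table
CREATE OR REPLACE TABLE table_name SHALLOW CLONE table_name2;
```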
6. Writing to table:
* Overwrite:
CREATE OR REPLACE TABLE events AS
SELECT * FROM parquet.`path`
This overwrites the existing table and accepts schema changes; old versions can still be queried via time travel.
* Insert Overwrite :
INSERT OVERWRITE table_name
SELECT * FROM parquet.`path`
It is similar to overwrite, but schema changes are not allowed.
7. Count_if:
There might be a question based on count_if.
SELECT count_if(user_id IS NULL) AS a FROM table
The query above counts the rows where user_id is NULL.
8. Filter in JSON:
SELECT id, FILTER (items, i -> i.item_id LIKE '%k') AS k_items FROM t
Don't get confused by the query above: FILTER is a higher-order function that keeps only the elements of the items array whose item_id ends with 'k'. It works just like a LIKE condition in a WHERE clause, but applied inside an array.
9. Transform in JSON:
Useful when we want to apply an existing function or lambda expression to each element of an array.
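As a sketch, TRANSFORM takes the same shape as the FILTER example above (t, items, and item_id are hypothetical, as before):

```sql
-- Apply an expression to every element of the items array,
-- producing a new array of the same length
SELECT id,
       TRANSFORM (items, i -> UPPER(i.item_id)) AS upper_ids
FROM t;
```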
10. Incremental load using Auto Loader and Structured Streaming:
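In SQL, Auto Loader typically appears inside a Delta Live Tables pipeline via the cloud_files() source, which incrementally picks up new files as they land; the path and format below are placeholders:

```sql
-- Incrementally ingest new JSON files from cloud storage with Auto Loader
CREATE OR REFRESH STREAMING LIVE TABLE raw_events
AS SELECT * FROM cloud_files("/mnt/landing/events", "json");
```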
11. Multi-Hop Architecture:
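The bronze/silver/gold layering can be sketched in plain Spark SQL as a chain of tables, each refining the previous one (table and column names are made up):

```sql
-- Bronze: raw ingested data, stored as-is
CREATE TABLE bronze_events AS
SELECT * FROM parquet.`/mnt/raw/events`;

-- Silver: cleaned and validated records
CREATE TABLE silver_events AS
SELECT * FROM bronze_events WHERE event_id IS NOT NULL;

-- Gold: business-level aggregates for reporting
CREATE TABLE gold_daily_counts AS
SELECT event_date, count(*) AS events
FROM silver_events
GROUP BY event_date;
```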
12. Delta Live Tables:
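A minimal DLT pipeline sketch in SQL, assuming a JSON landing path and an id column: tables are declared declaratively, dependencies are expressed with LIVE./STREAM(), and data quality is enforced with expectations:

```sql
-- Streaming bronze table fed by Auto Loader
CREATE OR REFRESH STREAMING LIVE TABLE bronze
AS SELECT * FROM cloud_files("/mnt/landing", "json");

-- Silver table with a data-quality expectation: rows with a NULL id are dropped
CREATE OR REFRESH STREAMING LIVE TABLE silver (
  CONSTRAINT valid_id EXPECT (id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.bronze);
```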
13. Data Governance Overview:
14. Unity Catalog:
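Unity Catalog introduces a three-level namespace, catalog.schema.table; the names below are placeholders:

```sql
-- Set the default catalog and schema for the session
USE CATALOG main;
USE SCHEMA analytics;

-- Or address a table with its fully qualified three-level name
SELECT * FROM main.analytics.trips;
```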
15. Permissions:
GRANT USAGE, CREATE ON CATALOG `hive_metastore` TO `users`;
SHOW GRANTS ON CATALOG `hive_metastore`;
In addition to this, once you have prepared for the examination, I recommend checking the key points of the Databricks Certified Associate Developer for Apache Spark 3 Exam Preparation as well; it will give you some clear points on Spark.
Thus, in this blog we looked at the key points, or thumb-rule topics, that need to be prepared for the "Databricks Certified Data Engineer Associate" exam. Hope this helps you clear the exam.
Thank you!